Fix "bundle init" when run from Databricks #1744

fjakobs · 2024-09-03T12:48:54Z

Changes

When the CLI is started from a Databricks cluster, use the import API to create notebooks during bundle init.

Tests

Manually tested by executing databricks bundle init from a web terminal on a Databricks cluster.

shreyas-goenka

Thanks! This also needs some unit tests. You could use a mocked workspace client and set it in the context for the unit test. You can use t.Setenv to set the DATABRICKS_RUNTIME_VERSION env var for the tests.

See TestResolveClusterReference for a reference of a mocked workspace client.

lennartkats-db

Thanks for looking at this! This was much less involved than I had expected. Just one remaining comment + one nit.

lennartkats-db · 2024-09-04T10:22:11Z

libs/runtime/detect.go

+
+const envDatabricksRuntimeVersion = "DATABRICKS_RUNTIME_VERSION"
+
+func RunsOnDatabricks(ctx context.Context) bool {


lennartkats-db (Sep 4, 2024): +1 for a common lib function. I'd expect this to also check for the existence of /Workspace though, to avoid false positives. There may be all kinds of reasons why customers set the env var locally.

fjakobs (Sep 4, 2024): I'd prefer not to test for /Workspace, as it makes the code using it harder to test.

lennartkats-db (Sep 23, 2024): Yeah, testing is a pain :( We can leave it out, but then we should at least emit a debug log message whenever there's a match. There will be false positives.

pietern (Sep 25, 2024): Testing is not an issue as long as you don't inline these expectations.

You can store a bool on the context that records whether or not we're running on DBR (see bundle/context.go for inspiration). In tests, you mark it as always true or always false depending on what you want to test. For the real CLI run, you run a routine that performs the actual detection and stores the result in the context.

For the check itself, it can look at /proc/mounts for the workspace fuse mount, it can check for /.fuse-mounts, it can check for /databricks, and I'm sure there are a couple of other stable ways to determine this.

pietern (Nov 7, 2024): Implemented this approach in #1889.

Besides the environment variable, it also checks for the presence of a /databricks directory.

lennartkats-db · 2024-09-04T10:26:00Z

libs/template/renderer.go

@@ -313,8 +319,7 @@ func (r *renderer) persistToDisk() error {
 		_, err := os.Stat(path)
 		if err == nil {
 			return fmt.Errorf("failed to initialize template, one or more files already exist: %s", path)
-		}
-		if err != nil && !errors.Is(err, fs.ErrNotExist) {
+		} else if !errors.Is(err, fs.ErrNotExist) {


Nit: isn't this easier to read without the else if?
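For comparison, here is a minimal standalone sketch of the check written without the else if. Both branches return, so two plain if statements behave the same way. The function name `checkNotExists` and the second error message are assumptions for illustration; only the first message appears in the diff above.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
)

// checkNotExists (hypothetical helper) returns nil only if nothing exists at path.
func checkNotExists(path string) error {
	_, err := os.Stat(path)
	if err == nil {
		// The path exists, so initializing the template would overwrite it.
		return fmt.Errorf("failed to initialize template, one or more files already exist: %s", path)
	}
	if !errors.Is(err, fs.ErrNotExist) {
		// Any error other than "not exist" is unexpected (message is assumed).
		return fmt.Errorf("failed to stat %s: %w", path, err)
	}
	return nil
}

func main() {
	// The current directory always exists, so this reports an error.
	fmt.Println(checkNotExists(".") != nil)
}
```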

shreyas-goenka

Thanks, looks good to me other than one comment. Please TAL!

shreyas-goenka · 2024-09-04T11:24:55Z

libs/template/file.go

+	if strings.HasPrefix(path, "/Workspace/") && runtime.RunsOnDatabricks(ctx) {
+		isNotebook, _, _ := notebook.DetectWithContent(path, content)
+		return isNotebook
+	} else {


style nit: remove else block

pietern

Couple small comments remaining.

FWIW, in hindsight it would have been neater to use a filer for writing everything. Then we could have swapped out the writing filer for the extension-aware workspace filer and things would have worked transparently (and all writes would use the API as opposed to only notebooks).
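To illustrate the idea of swapping the writer rather than branching on file type, here is a rough sketch. The `fileWriter` interface is a hypothetical, much reduced stand-in and not the CLI's filer.Filer; `apiWriter` only records writes instead of calling any workspace API.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// fileWriter is a hypothetical, reduced stand-in for a filer: one Write method.
type fileWriter interface {
	Write(name string, content []byte) error
}

// localWriter writes files below root on the local file system.
type localWriter struct{ root string }

func (w localWriter) Write(name string, content []byte) error {
	path := filepath.Join(w.root, name)
	if err := os.MkdirAll(filepath.Dir(path), 0o755); err != nil {
		return err
	}
	return os.WriteFile(path, content, 0o644)
}

// apiWriter stands in for a writer that would call the workspace API.
// Here it only records what it was asked to write.
type apiWriter struct{ written map[string][]byte }

func (w apiWriter) Write(name string, content []byte) error {
	w.written[name] = content
	return nil
}

// chooseWriter picks the implementation once; callers never branch on file type.
func chooseWriter(root string, onDBR bool) fileWriter {
	if onDBR && strings.HasPrefix(root, "/Workspace/") {
		return apiWriter{written: map[string][]byte{}}
	}
	return localWriter{root: root}
}

func main() {
	w := chooseWriter("/Workspace/Users/someone/project", true)
	// Notebooks and regular files go through the same call.
	_ = w.Write("src/notebook.ipynb", []byte("{}"))
	_ = w.Write("README.md", []byte("# project"))
	fmt.Printf("%T\n", w)
}
```

The decision about where the output lives is made in one place, so the rendering code does not need to know whether a given file is a notebook.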

pietern · 2024-09-09T10:29:52Z

libs/template/file_test.go

+	assert.False(t, shouldUseImportNotebook(ctx, "/Workspace/foo/bar", data))
+	assert.False(t, shouldUseImportNotebook(ctx, "/Workspace/foo/bar.ipynb", data))
+
+	t.Setenv("DATABRICKS_RUNTIME_VERSION", "14.3")


pietern (Sep 9, 2024): This should use env.Set.

fjakobs (Sep 17, 2024): What's the difference? I see t.Setenv used in tests. If I change it to env.Set, my tests break.

pietern (Sep 25, 2024): env.Set adds the variable to the context, and the context carries it along.

If it broke, then either 1) you weren't capturing the returned ctx, or 2) the function doesn't use env.Get to access the environment variables.

pietern · 2024-09-25T13:50:56Z

libs/runtime/detect.go

+
+func RunsOnDatabricks(ctx context.Context) bool {
+	value, ok := env.Lookup(ctx, envDatabricksRuntimeVersion)
+	return value != "" && ok


If you need the value to be non-empty, then you don't need to check if it exists (the OK).

You can use _ for unused variables, or use env.Get directly:

cli/libs/env/context.go, lines 51 to 56 in a4ba0bb:

	// Get key from the context or the environment.
	// Context has precedence.
	func Get(ctx context.Context, key string) string {
		v, _ := Lookup(ctx, key)
		return v
	}
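A quick way to see why the ok check is redundant: a missing key already yields the empty string, so comparing the value against "" covers both cases. The sketch below uses a map-backed `lookup` as a stand-in for env.Lookup; the function names `verbose` and `concise` are illustrative only.

```go
package main

import "fmt"

// lookup is a stand-in for env.Lookup backed by a plain map.
func lookup(vars map[string]string, key string) (string, bool) {
	v, ok := vars[key]
	return v, ok
}

// verbose mirrors the reviewed code: it checks both the value and the flag.
func verbose(vars map[string]string) bool {
	value, ok := lookup(vars, "DATABRICKS_RUNTIME_VERSION")
	return value != "" && ok
}

// concise drops the flag: a missing key already yields the empty string.
func concise(vars map[string]string) bool {
	value, _ := lookup(vars, "DATABRICKS_RUNTIME_VERSION")
	return value != ""
}

func main() {
	cases := []map[string]string{
		{}, // unset
		{"DATABRICKS_RUNTIME_VERSION": ""},     // set but empty
		{"DATABRICKS_RUNTIME_VERSION": "14.3"}, // set
	}
	for _, vars := range cases {
		fmt.Println(verbose(vars) == concise(vars)) // true for all three cases
	}
}
```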

pietern

Parking this PR.

Discussed that we should take the import approach for all files.

Extract functionality to detect if the CLI is running on DBR #1889

## Changes

Whether or not the CLI is running on DBR can be detected once and stored in the command's context. By storing it in the context, it can easily be mocked for testing.

This builds on the simpler approach and conversation in #1744. It unblocks testing of the DBR-specific paths while not compromising on the checks we can perform to test if the CLI is running on DBR.

## Tests

* Unit tests for the new `dbr` package
* New unit test for the `ConfigureWSFS` mutator

Use fs.FS interface to read template #1910

## Changes

While working on the v2 of #1744, I found that:

* Template initialization first copies built-in templates to a temporary directory before initializing them
* Reading a template's contents goes through a `filer.Filer` but is hardcoded to a local one

This change updates the interface for reading templates to be `fs.FS`. This is compatible with the `embed.FS` type for the built-in templates, so they no longer have to be copied to a temporary directory before being used.

The alternative is to use a `filer.Filer` throughout, but this would have required even more plumbing, and we don't need to _read_ templates, including notebooks, from the workspace filesystem (yet?).

As part of making `template.Materialize` take an `fs.FS` argument, the logic to match a given argument to a particular built-in template in the `init` command has moved to sit next to its implementation.

## Tests

Existing tests pass.

Fix template initialization when running on Databricks #1912

## Changes

When running the CLI on Databricks Runtime (DBR), use the extension-aware filer to write an instantiated template if the instance path is located in the workspace filesystem.

Notebooks cannot be written through the workspace filesystem's FUSE mount. As a result, this is the only method for initializing templates that contain notebooks when running the CLI on DBR and writing to the workspace filesystem.

Depends on #1910 and #1911. Supersedes #1744.

## Tests

* Manually confirmed I can initialize a template with notebooks when running the CLI from the web terminal.

pietern · 2024-11-20T12:29:05Z

This was superseded by #1912.

The new approach is to use a filer.Filer for all writes so that we do not differentiate on file type.

fjakobs requested review from pietern and shreyas-goenka September 3, 2024 12:48

fjakobs commented Sep 3, 2024

libs/template/file.go (outdated, resolved)

fjakobs commented Sep 3, 2024

libs/template/materialize.go (outdated, resolved)

fjakobs commented Sep 3, 2024

libs/template/file.go (outdated, resolved)

shreyas-goenka reviewed Sep 3, 2024

lennartkats-db reviewed Sep 3, 2024

libs/template/file.go (outdated, resolved)

libs/template/file.go (outdated, resolved)

libs/template/file.go (outdated, resolved)

pietern reviewed Sep 4, 2024

libs/template/file.go (outdated, resolved)

libs/template/file.go (outdated, resolved)

libs/template/file.go (outdated, resolved)

fjakobs force-pushed the init_on_databricks branch 2 times, most recently from a46942f to 20ee23e Compare September 4, 2024 09:32

fjakobs requested review from shreyas-goenka, lennartkats-db and pietern September 4, 2024 09:40

lennartkats-db approved these changes Sep 4, 2024

shreyas-goenka approved these changes Sep 4, 2024

pietern reviewed Sep 4, 2024

libs/runtime/detect.go (resolved)

libs/template/file.go (resolved)

fjakobs enabled auto-merge September 4, 2024 12:09

fjakobs disabled auto-merge September 4, 2024 12:09

fjakobs added 12 commits September 4, 2024 15:15

662234f Fix "bundle init" when run from Databricks
8d78809 use env.lookup
6585a75 address some of the PR feedback
62b2451 fix tests
08b6b10 add unit tests
90d6490 Centralize code to detect if we are running on Databricks
e51037f fixes
02745de Use notebook.Detect
4c9bf79 fix nit
694413b PR feedback
4cd9b17 add test
84d1bbf add comment

fjakobs force-pushed the init_on_databricks branch from c409696 to 84d1bbf Compare September 4, 2024 13:15

pietern reviewed Sep 9, 2024

fjakobs added 2 commits September 17, 2024 11:25

6bf59ff address PR feedback
49c6ed6 Merge branch 'main' into init_on_databricks

fjakobs added this pull request to the merge queue Sep 17, 2024

github-merge-queue bot removed this pull request from the merge queue due to failed status checks Sep 17, 2024

pietern reviewed Sep 25, 2024

pietern mentioned this pull request Nov 7, 2024

Extract functionality to detect if the CLI is running on DBR #1889

Merged

pietern requested changes Nov 7, 2024

This was referenced Nov 18, 2024

Use fs.FS interface to read template #1910

Merged

Fix template initialization when running on Databricks #1912

Merged

pietern closed this Nov 20, 2024

pietern deleted the init_on_databricks branch November 20, 2024 12:29
